Robust Data Mining: An Integrated Approach
Abstract
The continuous improvement and application of information system technologies are widely recognized in industry as critical for maintaining a competitive advantage in the marketplace (Shin et al., 2006). It is also recognized that improvement and application activities are most efficient and cost-effective when implemented at an early stage of process/product design. Data mining (DM) has emerged as a key feature of many applications in computer science. Often used to predict future directions and to extract hidden limitations and specifications of a product/process, DM involves the use of data analysis (DA) tools to discover previously unknown, valid patterns and relationships in a large database. Most DM methods for factor selection reported in the literature may yield a number of factors associated with the response factors of interest without providing detailed information, such as the relationships between the input factors and the response, statistical inferences, and analyses (Yang et al., 2007; Witten & Frank, 2005). On this basis, Gardner and Bieker (2000) suggested an alternative DA approach to semiconductor manufacturing problems for determining the significant factors. Furthermore, Su et al. (2005) developed an integrated procedure combining a DM method and Taguchi methods. DA is a term coined to describe the process of sifting through large databases to discover interesting patterns and relationships. This field spans several disciplines, such as databases, machine learning, intelligent information systems, statistics, and expert systems. Two approaches that enable the application of standard machine learning algorithms to large databases are factor selection and sampling.
Factor selection is known to be an effective method for reducing dimensionality, removing irrelevant and redundant data, increasing mining accuracy, and improving result comprehensibility (Yu & Liu, 2003). Consequently, it has been a fertile field of research and development since the 1970s and has proven effective at increasing the efficiency of mining tasks and improving mining performance, such as predictive accuracy. A factor selection algorithm performs a search through the space of feature subsets (Allen, 1974). In general, two categories of algorithm have been proposed to solve the factor selection problem. The first category is based on a filter approach, which is independent of the learning algorithm and serves as a filter to sieve out irrelevant factors. The second category is based on a wrapper approach, which uses the induction algorithm itself as part of the function that evaluates a factor subset (Langley, 1994). Since most of the filter methods are based on a heuristic...
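The difference between the filter and wrapper categories can be illustrated with a minimal sketch. It is not the procedure of any of the cited works: the toy data, the use of absolute Pearson correlation as the filter score, the 1-nearest-neighbour learner, and the function names `filter_select`, `loo_accuracy`, and `wrapper_select` are all assumptions chosen for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_select(X, y, k):
    """Filter: rank factors by |correlation| with the response; no learner is used."""
    columns = list(zip(*X))
    scores = [abs(pearson(col, y)) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]

def loo_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbour learner on the given factors."""
    hits = 0
    for i, row in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: sum((row[f] - X[j][f]) ** 2 for f in subset))
        hits += y[nearest] == y[i]
    return hits / len(X)

def wrapper_select(X, y):
    """Wrapper: greedy forward search that scores each candidate subset with the learner."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [f]), f) for f in remaining]
        score, factor = max(scored, key=lambda t: t[0])
        if score <= best:  # stop when no candidate improves the learner
            break
        selected.append(factor)
        remaining.remove(factor)
        best = score
    return selected

# Hypothetical toy data: factor 0 determines the response, factors 1 and 2 are noise.
X = [
    [0.1, 0.5, 0.7], [0.2, 0.1, 0.7], [0.3, 0.9, 0.2], [0.4, 0.3, 0.2],
    [0.6, 0.4, 0.7], [0.7, 0.8, 0.2 + 0.5], [0.8, 0.2, 0.2], [0.9, 0.6, 0.2],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

print("filter :", filter_select(X, y, 1))
print("wrapper:", wrapper_select(X, y))
```

The filter scores each factor once, so it is cheap but blind to how the learner will use the factors. The wrapper re-runs the learner for every candidate subset, which costs more but evaluates subsets by the performance that actually matters.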
Similar resources
An Integrated DEA and Data Mining Approach for Performance Assessment
This paper presents a data envelopment analysis (DEA) model combined with Bootstrapping to assess performance of one of the Data mining Algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...
Applying an integrated fuzzy gray MCDM approach: A case study on mineral processing plant site selection
The accurate selection of a processing plant site can result in decreasing total mining cost. This problem can be solved by multi-criteria decision-making (MCDM) methods. This research introduces a new approach by integrating fuzzy AHP and gray MCDM methods to solve all decision-making problems. The approach is applied in the case of a copper mine area. The critical criteria are considered adja...
Testing the Exactitude of Estimation Methods in the Presence of Outliers: An accounting for Robust Kriging
Estimation of gold reserves and resources has been of interest to mining engineers and geologists for ages. The existence of outlier values shows the economic part of the deposits subject to the fact that don’t depend on the human or technical errors. The presence of these high values causes a pseudo dramatically increment in variance estimation of economical blocks when applying conventional m...
Integrated planning for blood platelet production: a robust optimization approach
Perishability of blood products as well as uncertainty in demand amounts complicate the management of blood supply for blood centers. This paper addresses a mixed-integer linear programming model for blood platelets production planning while integrating the processes of blood collection as well as production/testing, inventory control and distribution. Whole blood-derived production methods for...
Robust production scheduling in open-pit mining under uncertainty: a box counterpart approach
Open-Pit Production Scheduling (OPPS) problem focuses on determining a block sequencing and scheduling to maximize Net Present Value (NPV) of the venture under constraints. The scheduling model is critically sensitive to the economic value volatility of block, block weight, and operational capacity. In order to deal with the OPPS uncertainties, various approaches can be recommended. Robust opti...
Application of an integrated decision-making approach based on FDAHP and PROMETHEE for selection of optimal coal seam for mechanization; A case study of the Tazareh coal mine complex, Iran
Increasing the production rate and minimizing the related costs, while optimizing the safety measures, are nowadays’ most important tasks in the mining industry. To these ends, mechanization of mines could be applied, which can result in significant cost reductions and higher levels of profitability for underground mines. The potential of a coal mine mechanization depends on some important fact...
Publication date: 2012